Southampton
- Europe > United Kingdom > England > Hampshire > Southampton (0.04)
- Asia > Middle East > Israel (0.04)
- Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
- Europe > Belgium > Brussels-Capital Region > Brussels (0.04)
- Asia > Middle East > Jordan (0.04)
- (5 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.68)
- North America > United States > Texas (0.14)
- North America > Canada > Ontario > National Capital Region > Ottawa (0.13)
- North America > Canada > Ontario > Toronto (0.13)
- (44 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Overview (1.00)
- (2 more...)
- Transportation > Passenger (1.00)
- Transportation > Air (1.00)
- Leisure & Entertainment (1.00)
- (25 more...)
Royal Navy returns to wind power with trial of robotic sailboats
Oshen's robotic sailboats are powered by the wind and the sun. The UK's Royal Navy may return to the age of sail, with a new demonstration involving a flotilla of small, wind-propelled robot boats. Made by Oshen in Plymouth, UK, the vessels, known as C-Stars, are just 1.2 metres long and weigh around 40 kilograms. Solar panels power navigation, communications and sensors, while a sail provides propulsion. Deployed as a constellation, the small vessels act as a wide-area sensor network. "The simplest way of describing C-Stars is as self-deploying, station-keeping ocean buoys," says Oshen CEO Anahita Laverack.
- North America > United States (0.50)
- Europe > United Kingdom > England > Devon > Plymouth (0.25)
- Europe > United Kingdom > England > Hampshire > Southampton (0.05)
- (2 more...)
- Health & Medicine (1.00)
- Energy > Renewable > Wind (0.51)
- Government > Regional Government > North America Government > United States Government (0.50)
- (2 more...)
- Information Technology > Communications > Social Media (1.00)
- Information Technology > Artificial Intelligence > Robots (1.00)
MosaicBERT: A Bidirectional Encoder Optimized for Fast Pretraining
Portes, Jacob
Although BERT-style encoder models are heavily used in NLP research, many researchers do not pretrain their own BERTs from scratch due to the high cost of training. In the past half-decade since BERT first rose to prominence, many advances have been made with other transformer architectures and training configurations that have yet to be systematically incorporated into BERT.
- Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Europe > United Kingdom > England > Hampshire > Southampton (0.04)
A review of NMF, PLSA, LBA, EMA, and LCA with a focus on the identifiability issue
Qi, Qianqian, van der Heijden, Peter G. M.
Across fields such as machine learning, social science, and geography, considerable attention has been given to models that factorize a nonnegative matrix into the product of two or three matrices, subject to nonnegativity or row-sum-to-1 constraints. Although these models are to a large extent similar or even equivalent, they are presented under different names, and their similarity is not well known. This paper highlights similarities among five popular models: latent budget analysis (LBA), latent class analysis (LCA), end-member analysis (EMA), probabilistic latent semantic analysis (PLSA), and nonnegative matrix factorization (NMF). We focus on an essential issue of these models, identifiability, and prove that the solution of LBA, EMA, LCA, and PLSA is unique if and only if the solution of NMF is unique. We also provide a brief review of algorithms for these models. We illustrate the models with a time-budget dataset from social science, and end the paper with a discussion of closely related models such as archetypal analysis.
- Asia > Middle East > Jordan (0.04)
- Asia > Singapore (0.04)
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- (9 more...)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)
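The factorization the abstract describes can be made concrete with a small sketch. This is not the paper's own algorithm, just the standard Lee-Seung multiplicative-update rule for NMF on a toy matrix, with a final row normalisation of W to show how the PLSA/LBA-style row-sum-to-1 parameterisation falls out of the same factorization; all data and sizes here are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy nonnegative data matrix: 6 samples x 4 features.
X = rng.random((6, 4))

k = 2  # number of latent components
W = rng.random((6, k))
H = rng.random((k, 4))

# Lee-Seung multiplicative updates for min ||X - W H||_F^2, W, H >= 0.
# Each update multiplies by a nonnegative ratio, so nonnegativity is kept.
eps = 1e-10
for _ in range(500):
    H *= (W.T @ X) / (W.T @ W @ H + eps)
    W *= (X @ H.T) / (W @ H @ H.T + eps)

# Row-normalising W (and absorbing the scale into H) gives the
# row-sum-to-1 parameterisation shared by PLSA and LBA.
s = W.sum(axis=1, keepdims=True)
W_norm = W / s
err = np.linalg.norm(X - W @ H)
```

The identifiability question the paper studies is visible even here: any invertible rescaling/permutation applied between W and H leaves the product W @ H unchanged, so uniqueness of the solution is not automatic.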
Fast Factorized Learning: Powered by In-Memory Database Systems
Stöckl, Bernhard, Schüle, Maximilian E.
Learning models over factorized joins avoids redundant computation by identifying and pre-computing shared cofactors. Previous work has investigated the performance gain when computing cofactors on traditional disk-based database systems; in the absence of published code, those experiments could not be reproduced on in-memory database systems. This work describes an implementation that uses cofactors for in-database factorized learning. We benchmark our open-source implementation for learning linear regression on factorized joins with PostgreSQL, a disk-based database system, and HyPer, an in-memory engine. The evaluation shows a performance gain for factorized learning on in-memory database systems of 70% over non-factorized learning, and of a factor of 100 compared to disk-based database systems. Thus, modern database engines can contribute to the machine learning pipeline by pre-computing aggregates prior to data extraction to accelerate training.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > District of Columbia > Washington (0.04)
- (5 more...)
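The core idea of pushing regression aggregates into the database can be sketched in a few lines. This is not the paper's implementation (which targets PostgreSQL and HyPer on factorized joins); it is a minimal single-table illustration using Python's built-in sqlite3 as a stand-in engine, with made-up data, showing how linear regression needs only a handful of SQL aggregates rather than the raw rows.

```python
import sqlite3
import numpy as np

# Single-feature regression y = a*x + b: the sufficient statistics
# (count and sums) are computed as SQL aggregates inside the engine,
# so only five numbers leave the database instead of the full table.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (x REAL, y REAL)")
xs = np.linspace(0, 9, 10)
ys = 2.0 * xs + 1.0  # synthetic, exactly linear data
con.executemany("INSERT INTO t VALUES (?, ?)", zip(xs.tolist(), ys.tolist()))

n, sx, sy, sxx, sxy = con.execute(
    "SELECT COUNT(*), SUM(x), SUM(y), SUM(x*x), SUM(x*y) FROM t"
).fetchone()

# Solve the 2x2 normal equations built from the aggregates.
a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
b = (sy - a * sx) / n
```

Over a join of several tables, the same aggregates can be assembled from per-table cofactors without ever materialising the joined result, which is where the factorized-learning speedups come from.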
Automated Data Enrichment using Confidence-Aware Fine-Grained Debate among Open-Source LLMs for Mental Health and Online Safety
Mao, Junyu, Hills, Anthony, Tseriotou, Talia, Liakata, Maria, Shamir, Aya, Sayda, Dan, Atzil-Slonim, Dana, Djohari, Natalie, Mandal, Arpan, Roth, Silke, Ugwudike, Pamela, Niranjan, Mahesan, Middleton, Stuart E.
Real-world indicators are important for improving natural language processing (NLP) tasks, such as life events for mental health analysis and risky behaviour for online safety, yet labelling such information in NLP training datasets is often costly and/or difficult given the dynamic nature of such events. This paper compares several LLM-based data enrichment methods and introduces a novel Confidence-Aware Fine-Grained Debate (CFD) framework in which multiple LLM agents simulate human annotators and exchange fine-grained evidence to reach consensus. We describe two new expert-annotated datasets: a mental health Reddit wellbeing dataset and an online safety Facebook sharenting risk dataset. Our CFD framework achieves the most robust data enrichment performance against a range of baselines, and we show that this type of data enrichment consistently improves downstream tasks. Enriched features incorporated via debate transcripts yield the largest gains, outperforming the non-enriched baseline by 10.1% on the online safety task.
- North America > United States > New Mexico > Bernalillo County > Albuquerque (0.04)
- North America > Mexico > Mexico City > Mexico City (0.04)
- Europe > United Kingdom > England > Hampshire > Southampton (0.04)
- (4 more...)
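The consensus step of a confidence-aware debate can be illustrated with a toy sketch. This is not the CFD framework itself: the agent labels, confidences, and evidence strings below are hard-coded stand-ins for LLM outputs, and the real framework runs multiple rounds of fine-grained evidence exchange before converging. The sketch only shows the confidence-weighted aggregation idea.

```python
from dataclasses import dataclass
from collections import defaultdict

@dataclass
class Vote:
    agent: str
    label: str
    confidence: float  # self-reported, in [0, 1]
    evidence: str      # fine-grained justification the agents exchange

def consensus(votes):
    """Confidence-weighted vote; returns (label, normalised support)."""
    weight = defaultdict(float)
    for v in votes:
        weight[v.label] += v.confidence
    label, score = max(weight.items(), key=lambda kv: kv[1])
    return label, score / sum(weight.values())

# Hypothetical annotator agents labelling one Reddit post for a life event.
votes = [
    Vote("agent_a", "life_event", 0.9, "mentions losing a job"),
    Vote("agent_b", "life_event", 0.6, "past-tense disruption"),
    Vote("agent_c", "no_event", 0.4, "could be hypothetical"),
]
label, support = consensus(votes)
```

In a full debate loop, low-support outcomes would trigger another round in which agents see each other's evidence and may revise their labels and confidences.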
Variance Matters: Improving Domain Adaptation via Stratified Sampling
Domain shift remains a key challenge in deploying machine learning models to the real world. Unsupervised domain adaptation (UDA) aims to address this by minimising domain discrepancy during training, but the discrepancy estimates suffer from high variance in stochastic settings, which can stifle the theoretical benefits of the method. This paper proposes Variance-Reduced Domain Adaptation via Stratified Sampling (VaRDASS), the first specialised stochastic variance reduction technique for UDA. We consider two specific discrepancy measures, correlation alignment and the maximum mean discrepancy (MMD), and derive ad hoc stratification objectives for these terms. We then present expected and worst-case error bounds, and prove that our proposed objective for the MMD is theoretically optimal (i.e., minimises the variance) under certain assumptions. Finally, a practical k-means-style optimisation algorithm is introduced and analysed. Experiments on three domain shift datasets demonstrate improved discrepancy estimation accuracy and target domain performance.
- North America > United States (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Europe > United Kingdom > England > Hampshire > Southampton (0.04)
- (3 more...)
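Why stratification reduces estimator variance can be shown with a generic sketch. This is not VaRDASS, which stratifies MMD and correlation-alignment discrepancy estimates; it is the textbook effect on a simple mean estimator, with synthetic two-stratum data invented for illustration: proportional allocation across strata removes the between-strata component of the sampling variance.

```python
import numpy as np

rng = np.random.default_rng(0)

# Population with two equally sized strata whose means differ a lot:
# most of the population variance is between strata, not within them.
strata = [rng.normal(0.0, 1.0, 5000), rng.normal(5.0, 1.0, 5000)]
pop = np.concatenate(strata)
n = 40  # total sample size per estimate

def srs_mean():
    # Simple random sample: stratum proportions vary from draw to draw.
    return rng.choice(pop, n, replace=False).mean()

def stratified_mean():
    # Proportional allocation: exactly n/2 points from each stratum,
    # so the between-strata variance component is eliminated.
    parts = [rng.choice(s, n // 2, replace=False).mean() for s in strata]
    return float(np.mean(parts))

# Repeat each estimator many times and compare empirical variances.
srs = np.array([srs_mean() for _ in range(2000)])
strat = np.array([stratified_mean() for _ in range(2000)])
```

Both estimators are unbiased for the population mean, but the stratified one has markedly lower variance; the paper's contribution is choosing strata that make this reduction provably optimal for the MMD.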